Foresee then Evaluate: Decomposing Value Estimation with Latent Future Prediction
Abstract
The value function is the central notion of Reinforcement Learning (RL). Value estimation, especially with function approximation, can be challenging since it involves the stochasticity of environmental dynamics and reward signals that can be sparse and delayed in some cases. A typical model-free RL algorithm usually estimates the values of a policy by Temporal Difference (TD) or Monte Carlo (MC) algorithms directly from rewards, without explicitly taking dynamics into consideration. In this paper, we propose Value Decomposition with Future Prediction (VDFP), providing an explicit two-step understanding of the value estimation process: 1) first foresee the latent future, 2) then evaluate it. We analytically decompose the value function into a latent future dynamics part and a policy-independent trajectory return part, inducing a way to model latent dynamics and returns separately in value estimation. Further, we derive a practical deep RL algorithm, consisting of a convolutional model to learn a compact trajectory representation from past experiences, a conditional variational auto-encoder to predict the latent future dynamics, and a convex return model that evaluates the trajectory representation. In experiments, we empirically demonstrate the effectiveness of our approach for both off-policy and on-policy RL in several OpenAI Gym continuous control tasks as well as a few challenging variants with delayed reward.
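The two-step idea in the abstract can be illustrated with a small toy sketch. This is not the authors' algorithm: the chain environment, the fixed "move right with probability 0.7" policy, the discounted state-visitation representation, and every function name below are assumptions made for illustration. A sample mean stands in for the conditional variational auto-encoder, and a linear function stands in for the convex return model. The sketch only shows that, when the return model is linear in the trajectory representation, evaluating the predicted expected representation gives the same value as averaging returns directly from rewards.

```python
import random

GAMMA = 0.9
N_STATES = 4                      # hypothetical toy chain: states 0..3
REWARD = [0.0, 0.0, 0.0, 1.0]     # reward depends only on the state visited
HORIZON = 20


def step(state, rng):
    """Stochastic dynamics under a fixed policy: move right w.p. 0.7, else stay."""
    if rng.random() < 0.7:
        return min(state + 1, N_STATES - 1)
    return state


def rollout(start, rng):
    """Return the list of states visited after leaving `start`."""
    states, s = [], start
    for _ in range(HORIZON):
        s = step(s, rng)
        states.append(s)
    return states


def represent(states):
    """Trajectory representation (assumed): discounted visitation count per state."""
    m = [0.0] * N_STATES
    for t, s in enumerate(states):
        m[s] += GAMMA ** t
    return m


def return_model(m):
    """Policy-independent evaluation of a representation (linear in this sketch)."""
    return sum(w * x for w, x in zip(REWARD, m))


def foresee(start, rng, n=500):
    """Step 1: predict the expected future representation (here, a sample mean)."""
    reps = [represent(rollout(start, rng)) for _ in range(n)]
    return [sum(col) / n for col in zip(*reps)]


def direct_mc_value(start, rng, n=500):
    """Baseline: average discounted return computed directly from rewards."""
    total = 0.0
    for _ in range(n):
        states = rollout(start, rng)
        total += sum(GAMMA ** t * REWARD[s] for t, s in enumerate(states))
    return total / n


# Step 2: evaluate the foreseen representation with the return model.
two_step = return_model(foresee(0, random.Random(0)))
baseline = direct_mc_value(0, random.Random(0))
print(f"two-step: {two_step:.4f}  direct MC: {baseline:.4f}")
```

Both estimates use the same seed, so they are computed from identical rollouts and agree up to floating-point rounding. With a return model that is not linear in the representation, the two quantities would generally differ.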
Similar resources
Decomposing Parameter Estimation Problems
We propose a technique for decomposing the parameter learning problem in Bayesian networks into independent learning problems. Our technique applies to incomplete datasets and exploits variables that are either hidden or observed in the given dataset. We show empirically that the proposed technique can lead to orders-of-magnitude savings in learning time. We explain, analytically and empiricall...
Latent Attention For If-Then Program Synthesis
Automatic translation from natural language descriptions into programs is a longstanding challenging problem. In this work, we consider a simple yet important sub-problem: translation from textual descriptions to If-Then programs. We devise a novel neural network architecture for this task which we train end-to-end. Specifically, we introduce Latent Attention, which computes multiplicative weigh...
Analysing value substitution and confidence estimation for value prediction
Value Prediction is one of the newest techniques used to break down ILP limits. Despite being under continuous study during the last few years, a few aspects related to this emerging technique remain unanalysed in depth. Exhaustively investigated in the context of control speculation, confidence estimation has usually played a secondary role on value prediction and speculation. Closely linked t...
Prediction Outcome History-Based Confidence Estimation for Load Value Prediction
Load instructions occasionally incur very long latencies that can significantly affect system performance. Load value prediction alleviates this problem by allowing the CPU to speculatively continue processing without having to wait for the slow memory access to complete. Current load value predictors can only correctly predict about forty to seventy percent of the fetched load values. To avoid...
Projects under Foresee Uncertainty
Uncertainty appears as a significant barrier to projects attaining their intended performance goals, thereby contributing to project failure. Literature on project management under uncertainty has recommended a contingency however based on the premise that the level of uncertainty is static over project duration. We relaxed the assumption by considering variation in the level of uncertainty wit...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v35i11.17182